Abstract

We present a deterministic approximation algorithm that computes, in polynomial time, the logarithm
of the number of 'good' truth assignments of a random $k$-satisfiability ($k$-SAT) formula (by 'good'
we mean assignments that violate a small fraction of clauses). The relative error is bounded above by an
arbitrarily small constant $\epsilon$ with high probability¹ as long as the clause density (ratio of clauses
to variables) satisfies $\alpha < \alpha_u(k) = 2k^{-1}\log k\,(1 + o(1))$. The algorithm is based on the
computation of marginal distributions via belief propagation and the use of an interpolation procedure.
This scheme replaces the traditional one based on approximation of marginal probabilities via MCMC, in
conjunction with self-reduction, which is not easy to extend to the present problem.

To establish our results, we derive $2k^{-1}\log k\,(1 + o(1))$ as the threshold for uniqueness of the
Gibbs distribution on satisfying assignments of random infinite tree $k$-SAT formulae, which is of
interest in its own right.



1 Introduction 

Setup and Problem Statement. Given $N$ boolean variables $x_i$, $1 \le i \le N$, an $M$-clause
$k$-satisfiability ($k$-SAT) formula has the form $F = \wedge_{j=1}^{M} C_j$, where
$C_j = \vee_{t=1}^{k} z_{jt}$ with each literal $z_{jt}$ being either $x_i$ or $\bar{x}_i$ for some
$1 \le i \le N$. An assignment $x \in \{0,1\}^N$ of the variables $x_i$, $1 \le i \le N$, satisfies clause
$C_j$ if at least one of the $k$ literals of $C_j$ evaluates to true. We denote true by "1" and false by
"0". For a given $F$, let $E(x)$ denote the number of clauses of $F$ that are unsatisfied under the
assignment $x$. Given $\beta \in \mathbb{R}_+$ (called inverse temperature in statistical physics), define
the partition function as

$$Z_N(\beta, F) = \sum_{x \in \{0,1\}^N} e^{-\beta E(x)}. \tag{1}$$

Notice that $Z_N(\beta,F)$ weighs in favor of "good" assignments, i.e. assignments that satisfy more clauses.
As $\beta \to \infty$, $Z_N(\beta,F)$ becomes the number of assignments that satisfy (all clauses of) $F$.
The partition function naturally arises as the normalizing constant in the following probability measure
on $\{0,1\}^N$, often called the Boltzmann distribution related to $F$: for $x \in \{0,1\}^N$,

$$\mu_{\beta,F}(x) = \frac{1}{Z_N(\beta,F)} \prod_{j=1}^{M} \psi_j(x) = \frac{e^{-\beta E(x)}}{Z_N(\beta,F)}, \qquad \text{where } \psi_j(x) = \begin{cases} 1 & \text{if } x \text{ satisfies clause } C_j, \\ e^{-\beta} & \text{otherwise.} \end{cases} \tag{2}$$
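For very small formulas, the partition function (1) and the Boltzmann weights (2) can be evaluated by brute force, which is a convenient sanity check for any approximation scheme. The following is a minimal sketch, not part of the algorithm of this paper; the clause encoding (tuples of signed integers) and the toy formula are assumptions made for illustration.

```python
import itertools
import math

# Assumed encoding: a clause is a tuple of signed integers, +i for x_i and
# -i for its negation (variables are numbered from 1).
TOY_FORMULA = [(1, 2, -3), (-1, 3, 4)]  # (x1 v x2 v ~x3) ^ (~x1 v x3 v x4)


def energy(formula, x):
    """E(x): number of clauses violated by the assignment x (tuple of 0/1)."""
    return sum(
        not any((x[abs(lit) - 1] == 1) == (lit > 0) for lit in clause)
        for clause in formula
    )


def partition_function(formula, n_vars, beta):
    """Z_N(beta, F) of Eq. (1), summing over all 2^N assignments."""
    return sum(
        math.exp(-beta * energy(formula, x))
        for x in itertools.product((0, 1), repeat=n_vars)
    )


def boltzmann_probability(formula, n_vars, beta, x):
    """mu_{beta,F}(x) of Eq. (2)."""
    return math.exp(-beta * energy(formula, x)) / partition_function(
        formula, n_vars, beta
    )
```

In the toy formula, 12 of the 16 assignments satisfy both clauses and 4 violate exactly one, so $Z_N(\beta,F) = 12 + 4e^{-\beta}$, which tends to the number of satisfying assignments as $\beta$ grows. The enumeration costs $2^N$ steps, so it is only usable for checking small cases.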



*Laboratoire de Physique Théorique de l'École Normale Supérieure, Paris. Research is partially supported by
the European Union under the IP EVERGROW. Email: montanar@lpt.ens.fr

†LIDS, MIT. Research is partially supported by NSF CAREER. Email: devavrat@mit.edu.
Keywords: Random k-SAT, Correlation Decay, Uniqueness, Gibbs Distribution

¹In this paper, by the term "with high probability" (whp) we mean with probability $1 - o_N(1)$.



We shall write $\mu(\cdot) = \mu_{\beta,F}(\cdot)$ whenever it is not necessary to specify the formula and
inverse temperature. We further denote by $\langle\cdot\rangle = \langle\cdot\rangle_{\beta,F}$ expectations
with respect to the measure $\mu$.

In this paper, we are interested in random $k$-SAT formulas. These are generated by selecting $M$
clauses independently and uniformly at random from all possible $2^k\binom{N}{k}$ $k$-clauses. Specifically,
we let $M$ scale linearly in $N$, i.e. $M = \alpha N$ for $\alpha \in \mathbb{R}_+$.
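The random ensemble just described is straightforward to sample. The sketch below draws each clause independently by picking $k$ distinct variables uniformly at random and negating each with probability 1/2, which is uniform over the $2^k\binom{N}{k}$ possible clauses. The function name, the signed-integer encoding and the rounding of $\alpha N$ to an integer are assumptions of this illustration.

```python
import random


def random_ksat(n_vars, k, alpha, rng):
    """Sample M = round(alpha * N) clauses independently and uniformly.

    A clause is a tuple of signed integers (+i for x_i, -i for its negation);
    this encoding is an assumption of the sketch.
    """
    n_clauses = int(round(alpha * n_vars))
    formula = []
    for _ in range(n_clauses):
        # k distinct variables, chosen uniformly among the N available
        variables = sorted(rng.sample(range(1, n_vars + 1), k))
        # each literal is negated independently with probability 1/2
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        formula.append(clause)
    return formula
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, for example `random_ksat(1000, 3, 0.25, random.Random(0))`.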

The main motivation of this paper is to describe an efficient algorithm that computes a good
approximation of $Z_N(\beta,F)$ for such random formulas. An important open conjecture is that,
for any $\alpha, \beta \in \mathbb{R}_+$, under the probability distribution induced by the random $k$-SAT
formula, the limit $\lim_{N\to\infty}\frac{1}{N}\log Z_N(\beta,F)$ exists with probability 1. The analysis of
our algorithm implies such a result for all finite $\beta$, and $\alpha$ smaller than a critical value.

Related Previous Work. The well-known threshold conjecture for random $k$-SAT states that for all
$k \ge 2$, there exists $\alpha_c(k)$ such that for $\alpha < \alpha_c(k)$ (resp. $\alpha > \alpha_c(k)$) the
randomly generated formula is satisfiable (resp. not satisfiable) with probability 1 as $N \to \infty$. There
has been a lot of interesting work on this topic, and a convergence of methods from different communities
[3, ?, ?]. Due to space limitations, we recall only some of the key relevant results.

Friedgut [?] established the existence of a sharp threshold. More precisely, he proved that there exists
$\alpha_c(k,N)$ such that, for any $\eta > 0$, the satisfiability probability tends to 1 (respectively to 0) if
$\alpha < \alpha_c(k,N)(1-\eta)$ (respectively $\alpha > \alpha_c(k,N)(1+\eta)$). While it is expected that
$\lim_{N\to\infty}\alpha_c(k,N)$ exists, a proof has remained elusive. Recently, Achlioptas and Peres [?]
established that $\alpha_c(k,N) = 2^k \ln 2\,(1 + o_k(1))$, thus implying that $\alpha_c(k,N)$ can be taken
$N$-independent to first order for large $k$.

The existence of $\lim_{N\to\infty}\lim_{\beta\to\infty}\frac{1}{N}\log Z_N(\beta,F)$ with probability 1, for
all $\alpha \in \mathbb{R}_+$ and $k$, naturally establishes the threshold conjecture. More generally, the
log-partition function at $\beta = \infty$ provides detailed information about the satisfying assignments
(computing it exactly is of course #P-complete). In [7] a formula for the limit log-partition function was
derived through the non-rigorous replica method from statistical physics. The existence of the
$N \to \infty$ limit was proved by Franz, Leone and Toninelli [8, ?] for even $k$ and all values of $\alpha$.
These authors also provided an upper bound on $\lim_{N\to\infty}\frac{1}{N}\log Z_N(\beta,F)$. However,
evaluating the bound requires solving an a priori complex optimization problem, and a matching lower
bound was not proved there. Talagrand established the existence of the limit and its value for very small
values of $\beta$ (depending on $k$).

Overview of Results. In this paper, we essentially prove that the Boltzmann distribution (2) is a
pure state [?] by establishing appropriate worst-case correlation decay for tree formulae. The approach of
Talagrand [?] also crucially relied on proving correlation decay, albeit by different means. This resulted
in a limitation to small values of $\beta$, leaving out the interesting regime of large $\beta$.

An analogy can be drawn with the Markov Chain Monte Carlo (MCMC) approach to the approximate
computation of partition functions (see, for example, work by Jerrum and Sinclair [?]). In that case,
the crucial step consists in proving an appropriate mixing condition ('temporal' correlation decay) for
some Markov Chain. The same role is played here by 'spatial' correlation decay with respect to the
measure (2).

In this paper, we establish correlation decay for random $k$-SAT formulae for a range of $\alpha$ and all
$\beta$. This allows us to establish that the deterministic Belief Propagation algorithm provides a good
approximation of the marginals with respect to the distribution (2), cf. Section 3. In the usual MCMC
approach, marginals are used to approximate the partition function by recursively fixing the variables and
exploiting self-reducibility. This cannot be done in the present case because the reduced SAT formulae are
not random anymore. Instead, we use interpolation in $\beta$ to obtain $\log Z_N(\beta,F)$ approximately
(Theorem 1). The analysis of the approximation scheme implies the existence of the limit
$\lim_{N\to\infty}\frac{1}{N}\log Z_N(\beta,F)$ (Theorem 3). We hope that our novel approach to counting will
find applications in other hard combinatorial problems. Similar schemes were recently discussed by Weitz
and by Bandyopadhyay and Gamarnik [12] for counting independent sets approximately via deterministic
algorithms.

Finally, we show that the computation of the partition function leads to an estimate of the number
of truth assignments that violate at most $N\epsilon$ clauses, for small $\epsilon$ (Theorem 4). As a
byproduct, we obtain the asymptotic (in $k$) threshold for uniqueness of the Gibbs measure on infinite
$k$-SAT tree formulae (Theorem 2).

Organization. Section 2 presents preliminaries and statements of the main results. Section 3
describes the approximate counting algorithm and the proofs of the key lemmas related to correlation
decay (or uniqueness) of the Gibbs distribution on random tree $k$-SAT formulae. Section 4 completes the
proofs of all main results stated in Section 2. We present directions for future work in Section 5.

2 Preliminaries and Main Results 

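Given $k$, define $\alpha_*(k)$ to be the smallest positive root of the equation $\kappa(\alpha) = 1$, where

$$\kappa(\alpha) = k(k-1)\,\alpha\,\Big(1 - \tfrac{1}{2}e^{-k\alpha/2}\Big)^{k-2}\Big(1 - \tfrac{1}{4}e^{-k\alpha/2}\Big). \tag{3}$$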
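Since $\kappa$ is a product of increasing functions of $\alpha$, the root of $\kappa(\alpha) = 1$ can be located by bisection. The sketch below assumes the form $\kappa(\alpha) = k(k-1)\alpha\,(1-\frac12 e^{-k\alpha/2})^{k-2}(1-\frac14 e^{-k\alpha/2})$, which is the one that results from the Poisson expectations in the proof of Lemma 3; the function names and the bracketing interval $[0,1]$ are choices made for this illustration.

```python
import math


def kappa(alpha, k):
    """Assumed form of kappa(alpha): see the lead-in for where it comes from."""
    y = math.exp(-k * alpha / 2.0)
    return k * (k - 1) * alpha * (1.0 - y / 2.0) ** (k - 2) * (1.0 - y / 4.0)


def alpha_star(k, tol=1e-10):
    """Root of kappa(alpha) = 1 by bisection; kappa is increasing in alpha."""
    lo, hi = 0.0, 1.0  # kappa(0) = 0 < 1 and kappa(1) > 1 for every k >= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kappa(mid, k) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `alpha_star(k)` for small `k` gives values that can be compared directly with the approximate thresholds quoted in the text.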

For $k = 2, 3, 4, 6$, $\alpha_*(k)$ is approximately 0.58216, 0.293, 0.217, 0.16670. Asymptotically,
$\alpha_*(k) = 2k^{-1}\log k\,\big(1 + O\big(\frac{\log\log k}{\log k}\big)\big)$. Now, we state the main result
of this paper about approximating the logarithm of the partition function.

Theorem 1. Given $\epsilon > 0$ and $\alpha < \alpha_*(k)$, there exist $\delta' > 0$ and a polynomial (in $N$,
independent of $\epsilon$) time algorithm that computes a number $\Phi(\beta,F)$ (the input being
$\beta \in \mathbb{R}_+$ and a satisfiability formula $F$) such that the following is true. If
$\beta \in [0, N^{\delta'}]$ and $F$ is a random $k$-SAT formula with $N$ variables and $M = N\alpha$ clauses,
then, with high probability,

$$\Phi(\beta,F)\,(1-\epsilon) \le \log Z_N(\beta,F) \le \Phi(\beta,F)\,(1+\epsilon). \tag{4}$$

The proof of Theorem 1 requires us to prove uniqueness of the Gibbs measure for the model (2) on infinite
random tree $k$-SAT formulae. To state this result, we first need some definitions. An appropriate model
for tree random $k$-SAT, $T_*(r)$, is described as follows. For $r = 0$, it is the graph containing a unique
variable node. For any $r \ge 1$, start with a single variable node (the root) and add
$l = \mathrm{Poisson}(k\alpha)$ clauses, each one including the root and $k-1$ new variables (first
generation variables). For each one of the $l$ clauses, the corresponding literals are non-negated or
negated independently with equal probability. If $r \ge 2$, generate an independent copy of $T_*(r-1)$ for
each variable node in the first generation and attach it to that node. By construction, for any $r' < r$ the
first $r'$ generations of a tree from $T_*(r)$ are distributed according to the model $T_*(r')$. As a
consequence, the infinite tree distribution $T_*(\infty)$ is also well defined. In what follows, we denote
the root of $T_*(\cdot)$ by 0. Let $\mu$ denote the Gibbs distribution (cf. (2)) of the random formula on
$T_*(r)$, and let $\mu_{0|r}(x_0|x_r)$ be the conditional distribution of the root variable given the
assignment $x_r$ of the $r$-th generation nodes of $T_*(r)$. The key property for most of the results of this
paper is that of correlation decay with respect to random tree formulas $T_*(\cdot)$.

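Definition 1. Given $\alpha, \beta \in \mathbb{R}_+$ and $k \ge 2$, the Gibbs distribution defined by (2) on
the random tree $T_*(\cdot)$ is unique with exponential correlation decay if there exist positive constants
$A, \gamma > 0$ such that

$$\mathbb{E}\,\sup_{x_r, z_r}\big\|\mu_{0|r}(\,\cdot\,|x_r) - \mu_{0|r}(\,\cdot\,|z_r)\big\|_{TV} \le A\,e^{-\gamma r}, \tag{5}$$

for any $r \ge 0$. The uniqueness threshold $\alpha_u(k)$ is the supremum value of $\alpha$ such that the above
condition is verified for any $\beta \in [0, \infty]$.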

The property defined here is a lot stronger than the usual notion of correlation decay, which only
requires $\|\mu_{0|r}(\,\cdot\,|x_r) - \mu_{0|r}(\,\cdot\,|z_r)\|_{TV} \to 0$ as $r \to \infty$ almost surely. Let
$\alpha'_u(k)$ denote the threshold for this weaker property. To the best of our knowledge, nothing has been
known about the precise values of $\alpha_u(k)$, $\alpha'_u(k)$ or the relation between them, other than the
trivial lower bound of $O(k^{-2})$ from the percolation threshold. We establish the precise asymptotic
behavior of $\alpha_u(k)$ and show that $\alpha_u(k) = \alpha'_u(k)(1 + o_k(1))$, as stated below.

Theorem 2. For the Gibbs distribution (2) defined on $T_*(\cdot)$ as above,

Though algorithmically we obtain an approximation of $\log Z_N(\beta,F)$, it is possible to establish the
convergence of $\frac{1}{N}\log Z_N(\beta,F)$ with probability 1. Before stating this result, we need some
definitions. In what follows, define the function $f : \mathbb{R}^{k-1} \to \mathbb{R}$ as

$$f(x_1,\dots,x_{k-1}) = -\frac{1}{2}\log\Big\{1 - \frac{1-e^{-\beta}}{2^{k-1}}\prod_{i=1}^{k-1}(1-\tanh x_i)\Big\}. \tag{7}$$

Let $\mathcal{D}$ denote the space of probability distributions on the real line $\mathbb{R}$. Define
functions $S, S_1, S_2 : \mathcal{D} \to \mathcal{D}$ as follows. Given $\mu \in \mathcal{D}$, define the random
variable $u = f(h_1,\dots,h_{k-1})$, where $h_1,\dots,h_{k-1}$ are i.i.d. with distribution $\mu$, and let
$S_1(\mu)$ be the distribution of $u$. Given a distribution $\nu \in \mathcal{D}$, let the random variable
$h_0 = \sum_{a=1}^{\ell_+} u_a - \sum_{b=1}^{\ell_-} u_b$, where $\ell_+, \ell_-$ are independent Poisson random
variables with mean $k\alpha/2$ and $u_a, u_b$ are i.i.d. with distribution $\nu$, and let $S_2(\nu)$ be the
distribution of $h_0$. Define $S = S_1 \circ S_2$.
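A fixed point of $S$ can be approximated numerically by the standard "population dynamics" heuristic: represent a distribution by a finite sample and apply $S_1 \circ S_2$ to the sample repeatedly. The sketch below is an illustration under that assumption, not a procedure taken from this paper; the population size, the number of sweeps, the helper names and the Poisson sampler are all choices made here, and $\beta$ must be finite.

```python
import math
import random


def f(hs, beta):
    """Eq. (7) for finite beta: clause-to-variable half log-likelihood."""
    prod = 1.0
    for h in hs:
        prod *= 1.0 - math.tanh(h)
    return -0.5 * math.log(1.0 - (1.0 - math.exp(-beta)) * prod / 2 ** len(hs))


def poisson(mean, rng):
    """Poisson sample by Knuth's multiplication method (fine for small means)."""
    limit, count, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        count += 1
        p *= rng.random()
    return count


def sample_h(pop_u, k, alpha, rng):
    """One draw from S_2 applied to the empirical distribution of pop_u."""
    plus = sum(rng.choice(pop_u) for _ in range(poisson(k * alpha / 2.0, rng)))
    minus = sum(rng.choice(pop_u) for _ in range(poisson(k * alpha / 2.0, rng)))
    return plus - minus


def iterate_S(pop_u, k, alpha, beta, rng):
    """Apply S = S_1 o S_2 once, producing a new population of u values."""
    return [
        f([sample_h(pop_u, k, alpha, rng) for _ in range(k - 1)], beta)
        for _ in range(len(pop_u))
    ]


def fixed_point_population(k, alpha, beta, size=2000, sweeps=30, seed=0):
    """Approximate sample from a fixed point of S, started from u = 0."""
    rng = random.Random(seed)
    pop = [0.0] * size
    for _ in range(sweeps):
        pop = iterate_S(pop, k, alpha, beta, rng)
    return pop
```

Every value produced by `f` lies in $[0, \beta/2]$, so the population stays bounded for finite $\beta$. The sample is only a Monte Carlo approximation of the fixed point, with accuracy limited by the population size.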
Now, we state the result. 

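Theorem 3. Given $k$, let $\alpha < \alpha_*(k)$ and $\beta \in [0, \infty)$. Then the function
$S : \mathcal{D} \to \mathcal{D}$ defined above has a unique fixed point, say $\mu^*$. Let
$\nu^* = S_2(\mu^*)$. Then,

$$\frac{1}{N}\log Z(\beta, F_N) \to \phi(\beta), \tag{8}$$

where

$$\begin{aligned}\phi(\beta) = {} & -k\alpha\,\mathbb{E}\log\big[1 + \tanh h\,\tanh u\big] + \alpha\,\mathbb{E}\log\Big\{1 - \frac{1}{2^k}(1-e^{-\beta})\prod_{j=1}^{k}(1-\tanh h_j)\Big\} \\ & + \mathbb{E}\log\Big\{\prod_{i=1}^{\ell_+}(1+\tanh u_i^+)\prod_{i=1}^{\ell_-}(1-\tanh u_i^-) + \prod_{i=1}^{\ell_+}(1-\tanh u_i^+)\prod_{i=1}^{\ell_-}(1+\tanh u_i^-)\Big\},\end{aligned} \tag{9}$$

where $u, u_i^{\pm}$ are i.i.d. with distribution $\mu^*$, $h, h_j$ are i.i.d. with distribution $\nu^*$, and
$\ell_\pm$ are Poisson of mean $k\alpha/2$.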

Finally, define $\Xi(\ell, F)$ to be the number of assignments that violate at most $\ell$ clauses. The next
result formalizes the relation between the approximation of $Z_N(\beta,F)$ and counting the number of
truth assignments that violate a small fraction of clauses.

Theorem 4. For any $k \ge 2$, $\epsilon > 0$, and $\alpha < \alpha_*(k)$ there exist $A, C > 0$ and $a > 0$ such
that the following is true. If $F$ is a random $k$-SAT formula with $M = N\alpha$ clauses over $N$ variables,
and $\beta = A\log(1/\epsilon)$, then

$$\big|\log\Xi(N\epsilon, F) - \Phi(\beta,F)\big| \le N C \epsilon^{a}, \tag{10}$$

with high probability, where $\Phi(\beta,F)$ is as defined in Theorem 1.



3 Algorithm and Key Lemmas 
3.1 Algorithm 

We first define a factor graph $G_F$ for a given formula $F$: each variable is represented by a (circle)
variable node and each clause by a (square) clause node, with an edge between a variable node and a clause
node only if the corresponding variable belongs to the clause. The edge is solid if the variable appears
non-negated and dashed if it appears negated. The Belief Propagation (BP) algorithm is a heuristic (exact
for tree factor graphs) to estimate the marginal distributions of node variables for any factor graph.
Specifically, we will use BP to approximately compute marginals of the distribution (2).

We will quickly recall BP for our specific setup. We refer the reader to [?, ?] for further details
on the algorithm. BP is a message passing algorithm in which, at each iteration, messages are sent from
variable nodes to neighboring clause nodes and vice versa. The messages at iteration $t+1$ are functions of
the messages received at iteration $t$. To describe the message update equations, we need some notation.
Let $\partial a$ denote the set of all variables that belong to clause $a$. If variable $x_i$ is involved in
clause $a$ as literal $z$ (either $z = x_i$ or $z = \bar{x}_i$), then define $\partial_+ i(a)$ as the set of all
clauses (minus $a$) in which $x_i$ appears as $z$. Similarly, $\partial_- i(a)$ denotes the set of all clauses
in which $x_i$ appears as $\bar{z}$. Let $\{h^{(t)}_{i\to a}\}$, $\{u^{(t)}_{a\to i}\}$ denote the messages
(ideally they are half log-likelihood ratios) that are passed along the directed edges $i \to a$ and
$a \to i$ respectively at time $t$. The precise update equations are

$$h^{(t+1)}_{i\to a} = \sum_{b\in\partial_+ i(a)} u^{(t)}_{b\to i} - \sum_{b\in\partial_- i(a)} u^{(t)}_{b\to i}, \qquad u^{(t+1)}_{a\to i} = f\big(\{h^{(t)}_{j\to a};\ j \in \partial a\setminus i\}\big), \tag{11}$$

where the function $f(\cdot)$ has been defined in Eq. (7). We shall assume² that the update equations are
initialized by $h^{(0)}_{i\to a} = u^{(0)}_{a\to i} = 0$ and that the algorithm stops at iteration $t_{\max}$,
which is equal to the diameter of $G_F$. Let $(h_{i\to a}, u_{a\to i})$ be the messages passed in the last
iteration of BP. Using these messages, an estimate of the probability that a clause is satisfied can be
obtained as follows. Let $E_a(x_{\partial a})$ be the indicator function for the $a$-th clause not being
satisfied. As mentioned above, $h_{i\to a}$ is thought of as the half log-likelihood ratio for $i$ satisfying
$a$ versus $i$ not satisfying $a$, in the absence of clause $a$ itself. A little algebra then shows that the BP
estimate for the expectation of $E_a(x_{\partial a})$ is

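$$\langle E_a(x_{\partial a})\rangle_{BP} = \frac{\sum_{x_{\partial a}} E_a(x_{\partial a})\exp\big\{-\beta E_a(x_{\partial a}) + \sum_{i\in\partial a} h_{i\to a}\,\sigma_{ai}(x_i)\big\}}{\sum_{x_{\partial a}} \exp\big\{-\beta E_a(x_{\partial a}) + \sum_{i\in\partial a} h_{i\to a}\,\sigma_{ai}(x_i)\big\}}, \tag{12}$$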

where $\sigma_{ai}(x) = +1$ if setting $x_i = x$ satisfies clause $a$, and $= -1$ otherwise. We further
introduce the number of clauses violated by $x$, $E(x) = \sum_a E_a(x_{\partial a})$, and its BP estimate
$\langle E(x)\rangle_{BP} = \sum_a \langle E_a(x_{\partial a})\rangle_{BP}$.
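The message passing scheme and the clause estimate above can be sketched in a few lines. The code below runs parallel BP updates from zero messages and evaluates the BP estimate of the probability that a clause is violated; a brute-force routine is included so the two can be compared on a small tree formula, where BP is exact. The signed-integer clause encoding, the function names, the fixed iteration count and the example formula are assumptions of this illustration, and $\beta$ must be finite.

```python
import itertools
import math


def f(hs, beta):
    """Clause-to-variable update: -1/2 log{1 - (1-e^-beta) prod(1-tanh h)/2^(k-1)}."""
    prod = 1.0
    for h in hs:
        prod *= 1.0 - math.tanh(h)
    return -0.5 * math.log(1.0 - (1.0 - math.exp(-beta)) * prod / 2 ** len(hs))


def run_bp(formula, beta, n_iter):
    """Parallel BP; clauses are tuples of signed ints (+i: x_i, -i: negated)."""
    sign = {(a, abs(l)): (1 if l > 0 else -1) for a, c in enumerate(formula) for l in c}
    clauses_of = {}
    for (a, i) in sign:
        clauses_of.setdefault(i, []).append(a)
    h = {(i, a): 0.0 for (a, i) in sign}  # variable -> clause messages
    u = {(a, i): 0.0 for (a, i) in sign}  # clause -> variable messages
    for _ in range(n_iter):
        # a message u_{b->i} enters with + if i has the same sign in a and b
        new_h = {
            (i, a): sum(u[(b, i)] * sign[(b, i)] * sign[(a, i)]
                        for b in clauses_of[i] if b != a)
            for (i, a) in h
        }
        new_u = {
            (a, i): f([h[(abs(l), a)] for l in formula[a] if abs(l) != i], beta)
            for (a, i) in u
        }
        h, u = new_h, new_u
    return h, u


def bp_violation_prob(formula, a, h, beta):
    """BP estimate of the probability that clause a is violated."""
    hs = [h[(abs(l), a)] for l in formula[a]]
    all_false = math.exp(-sum(hs))  # weight of "every literal of a is false"
    total = math.prod(2.0 * math.cosh(x) for x in hs)
    return math.exp(-beta) * all_false / (total - (1.0 - math.exp(-beta)) * all_false)


def exact_violation_prob(formula, n_vars, a, beta):
    """Exact Boltzmann probability that clause a is violated (brute force)."""
    def violated(clause, x):
        return not any((x[abs(l) - 1] == 1) == (l > 0) for l in clause)
    num = den = 0.0
    for x in itertools.product((0, 1), repeat=n_vars):
        w = math.exp(-beta * sum(violated(c, x) for c in formula))
        den += w
        num += w * violated(formula[a], x)
    return num / den


TREE_FORMULA = [(1, 2, -3), (3, 4, -5), (-1, 6, 7)]  # factor graph is a tree
```

Summing `bp_violation_prob` over all clauses gives the BP estimate of the number of violated clauses. On formulas whose factor graph contains cycles the estimate is only approximate.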
Given $\beta > 0$, we let $\beta_i = i\beta/N^2$, for $i = 0, \dots, n = N^2$. Then,

$$\log Z(\beta,F) = \log Z(0,F) + \sum_{i=0}^{n-1}\log\frac{Z(\beta_{i+1},F)}{Z(\beta_i,F)} = N\log 2 + \sum_{i=0}^{n-1}\log\big\langle e^{-\Delta E(x)}\big\rangle_i, \tag{13}$$

where $\Delta = \beta_{i+1} - \beta_i$, and $\langle\cdot\rangle_i$ is a shorthand for expectation with respect
to the measure $\mu_{\beta_i,F}(\cdot)$. The above expression is difficult to evaluate. However, because
$\Delta$ is small, $\langle -\Delta E(x)\rangle_i$ is a good estimate of
$\log\langle e^{-\Delta E(x)}\rangle_i$. Hence, define the algorithm estimate as

$$\Phi(\beta,F) = N\log 2 - \sum_{i=0}^{n-1}\Delta\,\langle E(x)\rangle_{BP,i}, \tag{14}$$

where the subscript in $\langle\cdot\rangle_{BP,i}$ emphasizes that the BP computation must be performed at
inverse temperature $\beta_i$.
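The interpolation step can be checked in isolation by replacing the BP estimates with exact expectations on a formula small enough to enumerate. This is a sketch of the telescoping idea only: the number of steps is a free parameter here (rather than $N^2$), the expectations are exact rather than BP estimates, and the toy formula and function names are assumptions of the illustration.

```python
import itertools
import math

TOY_FORMULA = [(1, 2, -3), (-1, 3, 4)]  # hypothetical 4-variable example


def all_energies(formula, n_vars):
    """E(x) for each of the 2^N assignments (clauses are tuples of signed ints)."""
    return [
        sum(not any((x[abs(l) - 1] == 1) == (l > 0) for l in c) for c in formula)
        for x in itertools.product((0, 1), repeat=n_vars)
    ]


def log_z(energies, beta):
    """Exact log-partition function."""
    return math.log(sum(math.exp(-beta * e) for e in energies))


def mean_energy(energies, beta):
    """Exact expectation of E(x) at inverse temperature beta."""
    weights = [math.exp(-beta * e) for e in energies]
    return sum(w * e for w, e in zip(weights, energies)) / sum(weights)


def interpolated_log_z(energies, n_vars, beta, n_steps):
    """N log 2 minus the sum of Delta * <E>_i over the grid beta_i = i * Delta."""
    delta = beta / n_steps
    return n_vars * math.log(2.0) - sum(
        delta * mean_energy(energies, i * delta) for i in range(n_steps)
    )
```

Because the mean energy is non-increasing in the inverse temperature, this left-endpoint sum never exceeds the exact value of $\log Z$, and the gap shrinks as the number of steps grows. In the algorithm itself the exact expectations are unavailable and the BP estimates take their place.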

3.2 Key Lemmas 

Before presenting the useful lemmas, let us mention a few facts. Given the factor graph $G_F$ and a variable
node $i$, $1 \le i \le N$, let $B_i(r)$ denote the subgraph induced by the set of all variables that are within
shortest path distance $r$ of node $i$ (the distance between two variables sharing a clause is one).
Analogously, for a clause node $a$, $B_a(r)$ is the union of $B_i(r)$ with $i$ running over the variables
involved in $a$. Let $A$ be a subset of variable nodes, and let $x_A$ denote an assignment to the
corresponding variables. Given two such subsets $A, B \subseteq [N]$ and assignments $x_A$, $x_B$, let
$\mu_{A|B}(x_A|x_B)$ be the conditional probability, under the distribution (2), of the variables in $A$
given the assignment $x_B$ on $B$. The following is a well-known result about the BP algorithm (see [17]).

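Lemma 1. Given a clause $a$ and $r$, let $B_a(r+1)$ be a tree. Let $U = B_a(r)$ and $V = [N]\setminus U$. Then

$$\big|\langle E_a(x_{\partial a})\rangle - \langle E_a(x_{\partial a})\rangle_{BP}\big| \le \sup_{y_V, z_V}\big\|\mu_{\partial a|V}(\,\cdot\,|y_V) - \mu_{\partial a|V}(\,\cdot\,|z_V)\big\|_{TV}. \tag{15}$$

Indeed, both quantities satisfy

$$\min_{z_V}\Big\{\sum_{x_{\partial a}} E_a(x_{\partial a})\,\mu_{\partial a|V}(x_{\partial a}|z_V)\Big\} \le \langle E_a(x_{\partial a})\rangle,\ \langle E_a(x_{\partial a})\rangle_{BP} \le \max_{z_V}\Big\{\sum_{x_{\partial a}} E_a(x_{\partial a})\,\mu_{\partial a|V}(x_{\partial a}|z_V)\Big\}. \tag{16}$$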



²In fact, an arbitrary initial condition and a smaller number of iterations would not change our main results.

Figure 1: Pictorial representation of the recursion (21) on the factor graph $G_F$: filled squares represent
function nodes and empty circles variable nodes. Dashed edges correspond to negations.



Next, we present a known result about the locally tree-like structure of random $k$-SAT formulae (an
analogous result concerns the local structure of sparse random graphs).

Lemma 2. Consider $k \ge 2$, $\alpha \in [0, \infty)$ and a random $k$-SAT formula $F$ with clause density
$\alpha$. For $r \ge 0$, let $B_i(r)$ be the ball of radius $r$ centered at a uniformly random variable node
$i$. Let $S(r)$ be an $r$-generation tree with the same distribution as $T_*(r)$ (with the same values of $k$
and $\alpha$). Then there exist $A$, $\rho$ (dependent on $\alpha$, $k$) such that

$$\big\|\mathbb{P}\{B_i(r) \in \cdot\,\} - \mathbb{P}\{S(r) \in \cdot\,\}\big\|_{TV} \le \frac{A\,e^{\rho r}}{N}. \tag{17}$$



Lemma 3. Let $\alpha_*(k)$ be the smallest positive root of the equation $\kappa(\alpha) = 1$, with
$\kappa(\alpha)$ defined as in Eq. (3). Then $\alpha_*(k) \le \alpha_u(k)$.
Proof: Given an $r$-generations tree formula $F$, consider an edge $i \to a$ directed toward the root
and the subtree rooted at $i$ and not containing $a$. Denote by $\mu_{i\to a}(\cdot)$ the marginal
distribution of $x_i$ with respect to the model associated to this subtree, and let
$h_{i\to a} \in [-\infty, \infty]$ be the corresponding log-likelihood ratio

$$h_{i\to a} = \frac{1}{2}\log\Big\{\frac{\mu_{i\to a}(x_i \text{ satisfies } a)}{\mu_{i\to a}(x_i \text{ does not satisfy } a)}\Big\}. \tag{18}$$

Analogously, given an edge $a \to i$, we consider the subtree rooted at $i$ and containing only $a$ among the
clauses involving $i$. We denote by $\mu_{a\to i}(\cdot)$ the corresponding marginal distribution at $i$, and
let

$$u_{a\to i} = \frac{1}{2}\log\Big\{\frac{\mu_{a\to i}(x_i \text{ satisfies } a)}{\mu_{a\to i}(x_i \text{ does not satisfy } a)}\Big\}. \tag{19}$$

It is easy to show that these log-likelihoods satisfy the recursions³

$$h_{j\to a} = \sum_{b\in\partial_+ j(a)} u_{b\to j} - \sum_{b\in\partial_- j(a)} u_{b\to j}, \qquad u_{a\to i} = f\big(\{h_{j\to a};\ j \in \partial a\setminus i\}\big), \tag{20}$$

with the function $f(\cdot)$ being defined as in Eq. (7). For the calculations below, it is convenient to
eliminate the $h_{j\to a}$ variables, to get

$$u_{a\to i} = f\Big(\sum_{b\in\partial_+ j_1(a)} u_{b\to j_1} - \sum_{b\in\partial_- j_1(a)} u_{b\to j_1};\ \dots;\ \sum_{b\in\partial_+ j_{k-1}(a)} u_{b\to j_{k-1}} - \sum_{b\in\partial_- j_{k-1}(a)} u_{b\to j_{k-1}}\Big), \tag{21}$$

where we denoted by $j_1, \dots, j_{k-1}$ the indices of the variables involved in clause $a$ (other than $i$).
A pictorial representation of this recursion is provided in Fig. 1.

³The reader will notice that these coincide with the BP update equations, cf. Eq. (11), which are known to
be exact on trees.

Notice that the above recursions hold irrespective of whether one considers the unconditional measure
$\mu(\cdot)$ or the conditional one $\mu(\,\cdot\,|x_r)$. What changes in the two cases is the initial
condition for the recursion, i.e. the value of $h_{i\to a}$ associated with the variables $i$ at the $r$-th
generation. For the unconditional measure ('free boundary condition'), the appropriate initialization is
$h_{i\to a} = 0$. If one conditions on $x_r$, then $h_{i\to a} = +\infty$ or $-\infty$, depending
(respectively) on whether $x_i$ satisfies clause $a$ or not.

In the rest of the proof, we shall always consider the conditioned measure $\mu(\,\cdot\,|x_r)$. As a
consequence, the log-likelihoods are, implicitly, functions of $x_r$: $u_{a\to i} = u_{a\to i}(x_r)$ (indeed,
of the restriction of $x_r$ to the subtree rooted at $i$ and only containing $a$). We then let

$$\bar{u}_{a\to i} = \max_{x_r} u_{a\to i}(x_r), \qquad \underline{u}_{a\to i} = \min_{x_r} u_{a\to i}(x_r). \tag{22}$$

In the case $\beta = \infty$, the maximum (minimum) is taken over all boundary conditions $x_r$ such that the
sub-formula rooted at $i$ admits at least one solution under the condition $x_r$ (there is always at least
one such boundary condition). We further let
$\Delta_{a\to i} = \bar{u}_{a\to i} - \underline{u}_{a\to i} \ge 0$.

Consider a random tree distributed as $T_*(r)$, conditioned on the root having degree 1, i.e. on the root
variable being involved in a unique clause, to be denoted by $a$. Let $\Delta^{(r)} = \Delta_{a\to i}$ be the
corresponding log-likelihood interval. We will show that $\mathbb{E}\tanh\Delta^{(r)} \le e^{-\gamma r}$ for
some positive constant $\gamma$. Before proving this claim, let us show that it indeed implies the thesis.
Denoting by $\partial_+ 0$ the set of clauses in which the root is involved as the direct literal, and by
$\partial_- 0$ the set in which it is involved as negated, we have

$$\big\|\mu_{0|r}(\,\cdot\,|x_r) - \mu_{0|r}(\,\cdot\,|z_r)\big\|_{TV} = \frac{1}{2}\big|\tanh h_0(x_r) - \tanh h_0(z_r)\big|, \tag{23}$$

$$h_0(x_r) = \sum_{a\in\partial_+ 0} u_{a\to 0}(x_r) - \sum_{a\in\partial_- 0} u_{a\to 0}(x_r). \tag{24}$$

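Since $x \mapsto \tanh(x)$ is monotonically increasing in $x$, we have

$$\big\|\mu_{0|r}(\,\cdot\,|x_r) - \mu_{0|r}(\,\cdot\,|z_r)\big\|_{TV} \le \frac{1}{2}\big\{\tanh\bar{h}_0 - \tanh\underline{h}_0\big\}, \tag{25}$$

$$\bar{h}_0 = \sum_{a\in\partial_+ 0}\bar{u}_{a\to 0} - \sum_{a\in\partial_- 0}\underline{u}_{a\to 0}, \qquad \underline{h}_0 = \sum_{a\in\partial_+ 0}\underline{u}_{a\to 0} - \sum_{a\in\partial_- 0}\bar{u}_{a\to 0}. \tag{26}$$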

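Using the elementary properties $\tanh x - \tanh y \le 2\tanh(x-y)$ for any $x \ge y$, and
$\tanh(x+y) \le \tanh x + \tanh y$ for $x, y \ge 0$, we get

$$\big\|\mu_{0|r}(\,\cdot\,|x_r) - \mu_{0|r}(\,\cdot\,|z_r)\big\|_{TV} \le \tanh\Big\{\sum_{a\in\partial 0}\Delta_{a\to 0}\Big\} \le \sum_{a\in\partial 0}\tanh\Delta_{a\to 0}. \tag{27}$$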



We can take the maximum over boundary conditions and the expectation with respect to the tree ensemble.
Recalling that $|\partial 0|$ is a Poisson random variable of mean $k\alpha$, we get

$$\mathbb{E}\max_{x_r, z_r}\big\|\mu_{0|r}(\,\cdot\,|x_r) - \mu_{0|r}(\,\cdot\,|z_r)\big\|_{TV} \le k\alpha\,\mathbb{E}\tanh\Delta^{(r)}, \tag{28}$$

which implies the thesis upon taking $A = k\alpha$.

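We are now left with the task of proving $\mathbb{E}\tanh\Delta^{(r)} \le e^{-\gamma r}$. It is easy to
realize that $f(x_1, \dots, x_{k-1})$ is monotonically decreasing in each of its arguments. Therefore Eq. (21)
yields the following recursion for upper/lower bounds

$$\bar{u}_{a\to i} = f\Big(\sum_{b\in\partial_+ j_1(a)}\underline{u}_{b\to j_1} - \sum_{b\in\partial_- j_1(a)}\bar{u}_{b\to j_1};\ \dots;\ \sum_{b\in\partial_+ j_{k-1}(a)}\underline{u}_{b\to j_{k-1}} - \sum_{b\in\partial_- j_{k-1}(a)}\bar{u}_{b\to j_{k-1}}\Big), \tag{29}$$

together with the equation obtained by interchanging $\bar{u}_{\dots}$ and $\underline{u}_{\dots}$. By taking
the difference of these two equations, we get

$$\Delta_{a\to i} = f(\underline{h}_1, \dots, \underline{h}_{k-1}) - f(\bar{h}_1, \dots, \bar{h}_{k-1}), \tag{30}$$

where we defined
$\bar{h}_l = \sum_{b\in\partial_+ j_l(a)}\bar{u}_{b\to j_l} - \sum_{b\in\partial_- j_l(a)}\underline{u}_{b\to j_l}$
and
$\underline{h}_l = \sum_{b\in\partial_+ j_l(a)}\underline{u}_{b\to j_l} - \sum_{b\in\partial_- j_l(a)}\bar{u}_{b\to j_l}$
(obviously $\bar{h}_l \ge \underline{h}_l$).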

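Suppose now that $n$ out of the $k-1$ variables $x_{j_1}, \dots, x_{j_{k-1}}$ are pure literals, let us say
variables $x_{j_1}, \dots, x_{j_n}$. This means that
$\partial_- j_1(a) = \dots = \partial_- j_n(a) = \emptyset$, and therefore, since the log-likelihoods
$u_{b\to j}$ are non-negative (because $f$ is non-negative), $\underline{h}_1, \dots, \underline{h}_n \ge 0$.
It is an easy exercise of analysis to show that, if $x_1, \dots, x_n \ge 0$,

$$0 \le -\frac{\partial f}{\partial x_i}(x_1, \dots, x_{k-1}) \le \frac{1}{2^n}. \tag{31}$$

Therefore, by the Mean Value Theorem,

$$\Delta_{a\to i} \le \frac{1}{2^n}\sum_{l=1}^{k-1}\big(\bar{h}_l - \underline{h}_l\big) = \frac{1}{2^n}\sum_{l=1}^{k-1}\sum_{b\in\partial j_l(a)}\Delta_{b\to j_l}. \tag{32}$$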
Next we take the hyperbolic tangent of both sides, and use again $\tanh(x+y) \le \tanh x + \tanh y$, for
$x, y \ge 0$, to get

$$\tanh\Delta_{a\to i} \le \frac{1}{2^n}\sum_{l=1}^{k-1}\sum_{b\in\partial j_l(a)}\tanh\Delta_{b\to j_l}. \tag{33}$$

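Finally we take the expectation of this inequality. In order to do this, we recall that $n$ is just the
number of pure literals among $x_{j_1}, \dots, x_{j_{k-1}}$. In our notation this can be written as
$n = \sum_{l=1}^{k-1}\mathbb{I}(|\partial_- j_l(a)| = 0)$. We further assume that $i$ is the root of a tree
from $T_*(r+1)$, $r \ge 0$, and therefore $\Delta_{a\to i}$ is distributed as $\Delta^{(r+1)}$. Furthermore
the differences $\Delta_{b\to j_l}$ will be distributed as $\Delta^{(r)}$. We thus obtain

$$\mathbb{E}\tanh\Delta^{(r+1)} \le \mathbb{E}\Big\{2^{-\sum_{l=1}^{k-1}\mathbb{I}(|\partial_- j_l(a)| = 0)}\sum_{l=1}^{k-1}\sum_{b\in\partial j_l(a)}\tanh\Delta_{b\to j_l}\Big\} = \tag{34}$$

$$= (k-1)\,\mathbb{E}\big\{2^{-\mathbb{I}(|\partial_- j| = 0)}\,|\partial j|\big\}\,\big\{\mathbb{E}\,2^{-\mathbb{I}(|\partial_- j| = 0)}\big\}^{k-2}\,\mathbb{E}\tanh\Delta^{(r)}. \tag{35}$$

The expectations over $|\partial_\pm j|$ are easily evaluated by recalling that these are independent
Poisson random variables of mean $k\alpha/2$. One finally obtains
$\mathbb{E}\tanh\Delta^{(r+1)} \le \kappa(\alpha)\,\mathbb{E}\tanh\Delta^{(r)}$. The thesis follows
(with $\gamma = -\log\kappa(\alpha)$) by noticing that $\mathbb{E}\tanh\Delta^{(0)} \le 1$, and recalling
that $\kappa(\alpha) < 1$ for $\alpha < \alpha_*(k)$. □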
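The derivative bound used in the proof above, namely that $-\partial f/\partial x_i$ lies between 0 and $2^{-n}$ when $n$ of the arguments are non-negative, can be probed numerically. The sketch below assumes that form of the bound and the expression for $f$ given in Eq. (7); it samples random points and records the range of central-difference slopes. The sampling ranges, step size and function names are choices made for this illustration, and a numerical probe is of course no substitute for the analytic argument.

```python
import math
import random


def f(xs, beta):
    """f of Eq. (7), for finite beta."""
    prod = 1.0
    for x in xs:
        prod *= 1.0 - math.tanh(x)
    return -0.5 * math.log(1.0 - (1.0 - math.exp(-beta)) * prod / 2 ** len(xs))


def slope_range(k, n_pure, beta, trials, seed=0, eps=1e-5):
    """Min and max central-difference estimate of -df/dx_i over random points
    whose first n_pure coordinates are positive (the 'pure literal' arguments)."""
    rng = random.Random(seed)
    lo, hi = math.inf, -math.inf
    for _ in range(trials):
        xs = [
            rng.uniform(0.1, 5.0) if j < n_pure else rng.uniform(-5.0, 5.0)
            for j in range(k - 1)
        ]
        for i in range(k - 1):
            up = xs[:i] + [xs[i] + eps] + xs[i + 1:]
            down = xs[:i] + [xs[i] - eps] + xs[i + 1:]
            slope = -(f(up, beta) - f(down, beta)) / (2.0 * eps)
            lo, hi = min(lo, slope), max(hi, slope)
    return lo, hi
```

For example, `slope_range(4, 2, 3.0, 200)` returns the smallest and largest slope observed for $k = 4$ with two non-negative arguments, to be compared with the interval $[0, 1/4]$.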

Next, we state a result about the error of the BP estimate of the expectation of a clause being satisfied
or not. To bound the error of the BP estimate of $\langle E_a(x)\rangle$, we need to study the error in the
estimation of the joint distribution of the $k$ variables in a clause. For this, we first choose a clause at
random and treat each of its $k$ variables as the root of one of $k$ independent rooted random trees (of
suitable depth $r$) as before. Note that this asymptotically does not bias the distribution of the original
random formula, as this process tilts the original distribution by at most $O(1/N)$.

To this end, let $x_r$ be an assignment for the $r$-th generation variables. We shall denote by
$\langle\cdot\rangle^{(r)}$ the expectation with respect to the graphical model (2) associated to a formula
constructed as follows. First we generate a uniformly random clause over variables $x_1, \dots, x_k$. Then we
sample $k$ independent trees according to $T_*(r)$ and root them at $x_1, \dots, x_k$. We let
$\langle\cdot\rangle^{(r)}_{x_r}$ be the corresponding conditional expectation, given the assignment $x_r$
to the $r$-th generation.

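Lemma 4. Let $k \ge 2$, $\alpha < \alpha_*(k)$ and $\beta \in [0, \infty]$. Then there exist two positive
constants $A$, $\gamma$ such that

$$\mathbb{E}\max_{x_r, z_r}\Big|\langle E_a(x)\rangle^{(r)}_{x_r} - \langle E_a(x)\rangle^{(r)}_{z_r}\Big| \le A\,e^{-\gamma r}. \tag{36}$$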

Proof: Denote by $x_{\partial a} = \{x_1, \dots, x_k\}$ the zeroth generation variables, and by
$T_1, \dots, T_k$ the tree factor graphs drawn from $T_*(r)$ and rooted, respectively, at variable nodes
$1, \dots, k$. We then denote by $\mu_i(x_i|x_r)$, $i \in \{1, \dots, k\}$, the conditional distribution for
variable $x_i$ with respect to the model associated with the tree $T_i$. We also let $h_i(x_r)$ be the
associated log-likelihoods (defined analogously to Eq. (18)), and let $\bar{h}_i = \max_{x_r} h_i(x_r)$
($\underline{h}_i = \min_{x_r} h_i(x_r)$) be their maximum (minimum) values with respect to the boundary
condition.

It is not hard to show that $\langle E_a(x)\rangle^{(r)}_{x_r} = g(h_1(x_r), \dots, h_k(x_r))$, where the
function $g : \mathbb{R}^k \to \mathbb{R}$ is defined as follows

$$g(x_1, \dots, x_k) = \frac{2^{-k}\,e^{-\beta}\prod_{i=1}^{k}(1 - \tanh x_i)}{1 - 2^{-k}(1 - e^{-\beta})\prod_{i=1}^{k}(1 - \tanh x_i)}. \tag{37}$$

Since $g(x_1, \dots, x_k)$ is monotonically decreasing in each of its arguments, we have

$$\mathbb{E}\max_{x_r, z_r}\Big|\langle E_a(x)\rangle^{(r)}_{x_r} - \langle E_a(x)\rangle^{(r)}_{z_r}\Big| \le \mathbb{E}\big\{g(\underline{h}_1, \dots, \underline{h}_k) - g(\bar{h}_1, \dots, \bar{h}_k)\big\}, \tag{38}$$

where the couples $(\bar{h}_1, \underline{h}_1), \dots, (\bar{h}_k, \underline{h}_k)$ are i.i.d. and distributed
as $(\bar{h}_0, \underline{h}_0)$ in the proof of Lemma 3, cf. Eq. (26). In particular, proceeding as in that
proof, we deduce that $\mathbb{E}\tanh(\bar{h}_i - \underline{h}_i) \le A\,e^{-\gamma r}$. We are left with
the task of proving that this implies an analogous bound on the right hand side of Eq. (38).

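To this end, we first consider a single variable function $\hat{g} : \mathbb{R} \to \mathbb{R}$ with
$0 \le \hat{g}(x) \le 1$ and $-1 \le \hat{g}'(x) \le 0$. Then, for any $\Delta > 0$,

$$\begin{aligned}\mathbb{E}\{\hat{g}(\underline{h}_1) - \hat{g}(\bar{h}_1)\} &\le \mathbb{P}\{\bar{h}_1 - \underline{h}_1 \ge \Delta\} + \mathbb{E}\{(\bar{h}_1 - \underline{h}_1)\,\mathbb{I}(\bar{h}_1 - \underline{h}_1 < \Delta)\} \\ &\le \frac{1}{\tanh\Delta}\,\mathbb{E}\tanh(\bar{h}_1 - \underline{h}_1) + \frac{\Delta}{\tanh\Delta}\,\mathbb{E}\{\tanh(\bar{h}_1 - \underline{h}_1)\,\mathbb{I}(\bar{h}_1 - \underline{h}_1 < \Delta)\} \\ &\le \frac{1 + \Delta}{\tanh\Delta}\,\mathbb{E}\tanh(\bar{h}_1 - \underline{h}_1).\end{aligned} \tag{39}$$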

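The proof is completed by writing
$\mathbb{E}\{g(\underline{h}_1, \dots, \underline{h}_k) - g(\bar{h}_1, \dots, \bar{h}_k)\} = \sum_{i=1}^{k}\mathbb{E}\{\hat{g}_i(\underline{h}_i) - \hat{g}_i(\bar{h}_i)\}$,
where $\hat{g}_i(x) = g(\bar{h}_1, \dots, \bar{h}_{i-1}, x, \underline{h}_{i+1}, \dots, \underline{h}_k)$, and
noticing that $-1 \le \frac{\partial g}{\partial x_i} \le 0$ (the last statement is proved in the
appendix). □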

Finally, a result that puts together the above observations to derive the net error in BP estimation. 

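Lemma 5. Let $k \ge 2$, $\alpha < \alpha_*(k)$ and $\beta \in [0, \infty]$. Then there exist two positive
constants $C$ and $\delta < 1$ such that for any $N$,

$$\mathbb{E}\big|\langle E(x)\rangle - \langle E(x)\rangle_{BP}\big| \le C N^{\delta}. \tag{40}$$

Proof: By linearity of expectation and using Lemma 1 we get

$$\mathbb{E}\big|\langle E(x)\rangle - \langle E(x)\rangle_{BP}\big| \le M\,\mathbb{E}\big|\langle E_a(x)\rangle - \langle E_a(x)\rangle_{BP}\big| \le M\,\mathbb{E}\Big\{\max_{x_r, z_r}\Big|\langle E_a(x)\rangle^{(r)}_{x_r} - \langle E_a(x)\rangle^{(r)}_{z_r}\Big|\Big\}. \tag{41}$$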



We would like to apply Lemma 4, but the expectation in the last expression is taken with respect to
the formula $F$ drawn from the random $k$-SAT ensemble, instead of the tree model $T_*(r)$. However,
the quantity in curly brackets depends only on the radius-$r$ neighborhood $B_a(r)$ of vertex $a$ in $G_F$.
Furthermore it is non-negative and upper bounded by 1. We can therefore apply Lemmas 2 and 4 to upper
bound the last expression by (here $\mathbb{E}_T$ denotes expectation with respect to the tree ensemble):

$$M\,\big\|\mathbb{P}\{B_i(r) \in \cdot\,\} - \mathbb{P}\{S(r) \in \cdot\,\}\big\|_{TV} + M\,\mathbb{E}_T\Big\{\max_{x_r, z_r}\Big|\langle E_a(x)\rangle^{(r)}_{x_r} - \langle E_a(x)\rangle^{(r)}_{z_r}\Big|\Big\} \le A\alpha\,e^{\rho r} + N A'\alpha\,e^{-\gamma r}. \tag{42}$$



The proof is completed by setting $r$ proportional to $\log N$; for instance, $r = \log N/(\rho + \gamma)$
yields Eq. (40) with $\delta = \rho/(\rho + \gamma) < 1$. □

4 Proofs of Theorems



4.1 Proof of Theorem 1

Clearly, the running time of the algorithm described in Section 3.1 is $O(N^4)$, as the total number of BP
runs is $O(N^2)$ and each BP run takes $O(N)$ iterations, or $O(N^2)$ serial operations. Now, we prove
Eq. (4).

Using the existing lower bounds on $\alpha_c(k,N)$ (see [?] and references therein), it is not hard to show
that $\alpha_*(k) < \alpha_c(k,N)(1-\eta)$ for some $\eta > 0$, all $k \ge 2$ and $N$ large enough. By
definition, for $\alpha < \alpha_c(k,N)(1-\eta)$ and $\beta \in [0, \infty]$ there exists a constant
$C(\alpha) > 0$ such that $\log Z(\beta,F) \ge C(\alpha) N\log 2$ whp. This follows from the following two
facts for appropriate $C(\alpha)$: (1) at least $C(\alpha)N$ variables do not appear in any clause whp, and
(2) at least one satisfying assignment exists whp, as $\alpha < \alpha_c(k,N)(1-\eta)$. Thus, there are at
least $2^{C(\alpha)N}$ satisfying assignments, whence $Z_N(\beta,F) \ge 2^{C(\alpha)N}$. Given this, it is
sufficient to show that $|\log Z(\beta,F) - \Phi(\beta,F)| \le N\epsilon$ whp for any $\epsilon > 0$ and $N$
large enough.

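Now, Eqs. (13) and (14) imply that

$$\begin{aligned}\big|\log Z(\beta,F) - \Phi(\beta,F)\big| &\le \sum_{i=0}^{n-1}\Big|\log\big\langle e^{-\Delta E(x)}\big\rangle_i + \Delta\,\langle E(x)\rangle_{BP,i}\Big| \\ &\le \sum_{i=0}^{n-1}\Big|\log\big\langle e^{-\Delta E(x)}\big\rangle_i + \Delta\,\langle E(x)\rangle_i\Big| + \sum_{i=0}^{n-1}\Delta\,\Big|\langle E(x)\rangle_i - \langle E(x)\rangle_{BP,i}\Big|.\end{aligned} \tag{43}$$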

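Consider the first term in (43): for any non-negative random variable $X$,
$\log\langle e^{-X}\rangle \le \langle e^{-X}\rangle - 1 \le \langle 1 - X + X^2\rangle - 1 \le -\langle X\rangle + \langle X^2\rangle$,
while $\log\langle e^{-X}\rangle \ge -\langle X\rangle$ by Jensen's inequality. As a consequence, we obtain

$$\sum_{i=0}^{n-1}\Big|\log\big\langle e^{-\Delta E(x)}\big\rangle_i + \Delta\,\langle E(x)\rangle_i\Big| \le \sum_{i=0}^{n-1}\Delta^2\,\langle E(x)^2\rangle_i \le \beta\Delta\,\sup_i\,\langle E(x)^2\rangle_i \le N^{2\delta'}\alpha^2, \tag{44}$$

where we used $\beta \le N^{\delta'}$, $\Delta = \beta/N^2 \le N^{\delta'-2}$ and $0 \le E(x) \le N\alpha$. If we
choose $\delta' < 1/2$, this contribution is smaller than $N\epsilon/2$ for all $N$ large enough.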

Now, the second term in Eq. (43): the bound (40) holds for any $\beta$ in the compact region $[0, \infty]$.
Further, the left hand side is uniformly bounded (in terms of $N$) and continuous in $\beta$. Hence, there
exists a $C$ so that the bound (40) holds uniformly for $\beta \in [0, \infty]$. This implies that

$$\sum_{i=0}^{n-1}\Delta\,\mathbb{E}\Big|\langle E(x)\rangle_i - \langle E(x)\rangle_{BP,i}\Big| \le \beta\,C N^{\delta} \le C N^{\delta + \delta'}. \tag{45}$$

Choosing $\delta' \in (0, 1-\delta)$ and applying the Markov inequality implies that the second term is also
bounded above by $N\epsilon/2$ whp. This completes the proof of Theorem 1. □



4.2 Proofs of Theorems 2, 3 and 4

Due to shortage of space, they are moved to Appendix A.

5 Discussion and Future Work 

We presented a novel deterministic algorithm for approximately counting good truth assignments of 
random fc-SAT formula with high probability. The algorithm is built upon the well-known Belief Propa- 
gation heuristic and an interpolation method for the log-partition function. In the process of establishing 
the correctness of the algorithm, we obtained the threshold for uniqueness of Gibbs distribution for ran- 
dom /c-SAT formula as 2fc _1 log/c(l + o/-(l)). This result if of interest in its own right. 

We believe that our result can be extended to a reasonable class of non-random /c-SAT formula. We 
also believe that the approximation guarantees of Theorem ^ should hold for any (3 € [0, 00]. 
A Proof Sketches: Theorems 2, 3 and 4

Due to space limitations, we only provide proof sketches for Theorems 2, 3 and 4.

Proof sketch for Theorem 2. By using the definition $\kappa(\alpha_*) = 1$ (with $\kappa(\alpha)$ as defined in the main text), it is easy to show that $\alpha_*(k) = 2k^{-1}\log k\,\{1 + O(\log\log k/\log k)\}$. To complete the proof, we need an (asymptotically in $k$) matching upper bound. To obtain it, we consider the case $\beta = \infty$, i.e. only satisfying assignments have positive weight. Consider a tree formula distributed as $T_*(r)$. Let $P_r$ be the probability that there exist two boundary conditions $x_r^{(0)}$, $x_r^{(1)}$ such that the root takes value, respectively, 0 or 1 in all the satisfying assignments with the respective boundary condition. Clearly, for the Gibbs measure to be unique (or have correlation decay) in the sense of our definition (but also in the weaker sense corresponding to the threshold $\alpha'_u(k)$), it must be that $P_r \to 0$ as $r \to \infty$. Hence, if we establish that for $\alpha > 2k^{-1}\log k\,\{1 + O(\log\log k/\log k)\}$ such boundary conditions exist with positive probability, then the proof will be complete. We do this next.

For this, consider a tree from $T_*(r)$ with the root having degree 1. Given such a tree, let $p_r$ be the probability that there exists a boundary condition $x_r$ such that, in every satisfying assignment with this boundary condition, the root variable is the only variable satisfying the only clause to which it belongs (recall that the root variable has degree 1). If $P_r \to 0$, then $p_r \to 0$. To prove this claim, assume by contradiction that $p_r$ remains bounded away from zero (say $p_r \ge p > 0$) and consider a tree from $T_*(r)$ (without conditioning). With positive probability the root belongs to two clauses in which it appears, respectively, directed and negated. With probability at least $p^2 > 0$, for each of the corresponding subtrees there exists a boundary condition that fixes the root variable to be (respectively) directed or negated. By extending each of these boundary conditions arbitrarily to the full tree, we obtain the desired pair of boundary conditions.

It turns out that $p_r$ can be determined recursively. Set $p_0 = 1$ and $p_{r+1} = \{1 - \exp(-k\alpha p_r/2)\}^{k-1}$. Then $p_r \to 0$ as $r \to \infty$ only if $\alpha < \alpha^*(k)$, where the threshold $\alpha^*(k)$ of the above recursion evaluates (with a little algebra) to $\alpha^*(k) = 2k^{-1}\log k\,\{1 + O(\log\log k/\log k)\}$. This completes the proof sketch of Theorem 2.
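The recursion above is easy to explore numerically. The sketch below iterates $p_{r+1} = \{1 - \exp(-k\alpha p_r/2)\}^{k-1}$ from $p_0 = 1$ and locates, by bisection, the smallest $\alpha$ at which the iterates stay bounded away from zero. The function names, the iteration cap, the cutoff `1e-12` and the bisection bracket are assumptions chosen for illustration; the result is a finite-precision estimate, not a proof of the asymptotic formula.

```python
import math

def iterate_recursion(k, alpha, max_iter=5000):
    """Iterate p_{r+1} = (1 - exp(-k*alpha*p_r/2))**(k-1) starting from p_0 = 1.

    Returns 0.0 as soon as the iterate drops below 1e-12, otherwise the last iterate.
    """
    p = 1.0
    for _ in range(max_iter):
        # -expm1(-x) equals 1 - exp(-x) but remains accurate for small x.
        p = (-math.expm1(-k * alpha * p / 2.0)) ** (k - 1)
        if p < 1e-12:
            return 0.0
    return p

def numerical_threshold(k, lo=1e-3, hi=50.0, n_bisect=40):
    """Bisection estimate of the smallest alpha for which p_r does not decay to 0."""
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if iterate_recursion(k, mid) > 0.0:
            hi = mid   # iterates survive: threshold is at or below mid
        else:
            lo = mid   # iterates vanish: threshold is above mid
    return 0.5 * (lo + hi)

for k in (5, 10, 20):
    # Columns: k, numerical threshold, leading term 2*log(k)/k.
    print(k, numerical_threshold(k), 2.0 * math.log(k) / k)
```

Bisection is legitimate here because the map is increasing in both $p$ and $\alpha$, so the limit of the iterates is monotone in $\alpha$. At moderate $k$ the correction term $O(\log\log k/\log k)$ is not small, so the two printed columns should only be expected to agree to leading order as $k$ grows.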

Proof sketch for Theorem 3. First notice that, if $F$ and $F'$ differ in a single clause, then $|\log Z(\beta,F) - \log Z(\beta,F')| \le 2\beta$. Hence, by the Azuma-Hoeffding inequality, $|\log Z - \mathbb{E}\log Z| \le N\delta$ with probability at least $1 - e^{-NC_\beta}$, for some $C_\beta > 0$, for any $\beta \in [0,\infty)$. Given this, to obtain the claimed almost sure convergence it is sufficient to prove that $\lim_{N\to\infty} N^{-1}\,\mathbb{E}\,\Phi(\beta,F) = \phi(\beta)$, in light of the bound on $|\log Z(\beta,F) - \Phi(\beta,F)|$ proved in Section 4 and the Borel-Cantelli lemma.
To do so, we first need to establish that

$$\lim_{N\to\infty} \frac{1}{N}\,\mathbb{E}\langle E(x)\rangle_{\mathrm{BP},\beta} = \alpha\,\mathbb{E}\, g(h_1,\dots,h_k), \qquad (46)$$

where $g$ is defined as in Eq. (37), and the random variables $h_1,\dots,h_k$ are i.i.d. with distribution $\nu^*$, the fixed point of the operator $S$ defined in the statement of Theorem 3. We claimed that the fixed point of $S$ is unique. To justify this claim, first note that the image of $S$ is contained in the space of distributions supported on $[0,\beta/2]$, call it $D_\beta$, which is compact with respect to the weak topology. Being continuous on $D_\beta$, $S$ admits at least one fixed point in it. Moreover, the contraction condition implied by the correlation decay (proved as a part of Theorem 2) implies the attractiveness as well as the uniqueness of the fixed point of $S$.

Once the existence of the unique fixed point is established, Eq. (46) follows from Lemma 5 and the correlation decay established in Theorem 2. Now, by integrating this relation over $\beta$ and observing that $\beta_{i+1} - \beta_i = \beta/N^2$ (hence the integration error is negligible at scale $1/N$), one gets

$$\lim_{N\to\infty} N^{-1}\,\mathbb{E}\,\Phi(\beta,F) = \log 2 - \alpha\int_0^{\beta} \mathbb{E}_{\beta'}\, g(h_1,\dots,h_k)\, d\beta', \qquad (47)$$

where a subscript has been added in $\mathbb{E}_{\beta'}$ to stress that the fixed point distribution has to be taken at inverse temperature $\beta'$. The proof of Theorem 3 is completed by showing that the right-hand side of the last equation equals $\phi(\beta)$. In fact, by taking the derivative of the expression for $\phi(\beta)$ with respect to $\beta$, one gets a contribution coming from the explicit $\beta$ dependence, which evaluates to $-\alpha\,\mathbb{E}\, g(h_1,\dots,h_k)$, and one from the $\beta$ dependence of the fixed point distribution, which can be shown to vanish.
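The concentration step at the start of this sketch can be made explicit. The following is a sketch under assumptions not stated in this appendix: that $F$ consists of exactly $M = \alpha N$ clauses drawn independently, and that they are revealed one at a time. The martingale $X_j$ is notation introduced here for illustration.

```latex
% Doob martingale obtained by revealing the clauses C_1, ..., C_M one at a time.
X_j = \mathbb{E}\left[\,\log Z(\beta,F) \mid C_1,\dots,C_j\,\right], \qquad
X_0 = \mathbb{E}\log Z(\beta,F), \qquad X_M = \log Z(\beta,F).

% Changing a single clause moves log Z by at most 2*beta, so |X_j - X_{j-1}| <= 2*beta.
% Azuma-Hoeffding with M = alpha*N increments bounded by 2*beta then gives
\mathbb{P}\left( \left|\log Z - \mathbb{E}\log Z\right| \ge N\delta \right)
  \le 2\exp\left( -\frac{(N\delta)^2}{2M(2\beta)^2} \right)
  = 2\exp\left( -\frac{N\delta^2}{8\alpha\beta^2} \right).
```

Under these assumptions, any $C_\beta < \delta^2/(8\alpha\beta^2)$ works for $N$ large enough, and the resulting probabilities are summable in $N$, which is what the Borel-Cantelli argument requires.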

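Proof sketch for Theorem 4. For ease of notation, let $Z(\beta) = Z_N(\beta,F)$, $\Xi(\zeta) = \Xi(\zeta,F)$ and $U(\beta) = \langle E(x)\rangle_{\beta,F}$. Because of the bound on $|\log Z(\beta,F) - \Phi(\beta,F)|$ proved in Section 4, it is sufficient to prove that $|\log\Xi(N\varepsilon) - \log Z(\beta)| \le N\varepsilon^{a}$ w.h.p. This follows from two inequalities.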
First inequality. For any $\zeta > 0$,

$$Z(\beta) = \sum_{x:\,E(x)>\zeta} e^{-\beta E(x)} + \sum_{x:\,E(x)\le\zeta} e^{-\beta E(x)} \ge e^{-\beta\zeta}\,\Xi(\zeta). \qquad (48)$$
Second inequality. For any $\zeta > 0$, using the first equality in (48) and $e^{-\beta E(x)} \le 1$, we obtain

$$Z(\beta) \le \sum_{x:\,E(x)>\zeta} e^{-\beta E(x)} + \Xi(\zeta).$$

Equivalently, $Z(\beta)\,\mu(E(x) \le \zeta) \le \Xi(\zeta)$. Now take $\zeta = 2U(\beta)$; then, by Markov's inequality,

$$\mu\{E(x) \le 2U(\beta)\} \ge 1 - \frac{U(\beta)}{2U(\beta)} = \frac{1}{2}. \qquad (49)$$

From (48) and (49), we obtain

$$\log Z(\beta) - \log 2 \le \log\Xi(2U(\beta)) \le \log Z(\beta) + 2\beta U(\beta). \qquad (50)$$
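The sandwich (50) holds for every formula, so it can be verified by brute force on a small instance. The sketch below assumes that $\Xi(\zeta)$ counts the assignments with $E(x) \le \zeta$, which is how it enters inequality (48); the helper names and the instance parameters are hypothetical choices for illustration.

```python
import itertools
import math
import random

def random_ksat(n_vars, n_clauses, k, rng):
    """Random formula: each clause is k (variable, negated) pairs on distinct variables."""
    return [[(v, rng.random() < 0.5) for v in rng.sample(range(n_vars), k)]
            for _ in range(n_clauses)]

def energy(formula, x):
    """E(x): number of clauses in which every literal is false under x."""
    return sum(all((x[v] == 1) == neg for v, neg in clause) for clause in formula)

def sandwich_terms(formula, n_vars, beta):
    """Return (log Z(beta), U(beta), log Xi(2 U(beta))) by exhaustive enumeration."""
    energies = [energy(formula, x) for x in itertools.product((0, 1), repeat=n_vars)]
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    u = sum(w * e for w, e in zip(weights, energies)) / z   # U(beta) = <E(x)>
    xi = sum(1 for e in energies if e <= 2.0 * u)           # assumed: Xi counts E(x) <= zeta
    return math.log(z), u, math.log(xi)

rng = random.Random(1)
N = 10
F = random_ksat(N, n_clauses=30, k=3, rng=rng)
results = []
for beta in (0.0, 0.5, 1.0, 2.0, 4.0):
    log_z, u, log_xi = sandwich_terms(F, N, beta)
    results.append((beta, log_z, u, log_xi))
    print(beta, log_z - math.log(2), log_xi, log_z + 2.0 * beta * u)
```

Each printed row lists the lower bound, $\log\Xi(2U(\beta))$ and the upper bound of (50), in that order. The check is exact up to floating-point error and does not depend on the formula being random.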

The next step consists in controlling $U(\beta)$ at large $\beta$. Arguing analogously to the proofs above, one can show that there exist constants $C_1, C_2, C_3, b > 0$ such that, for any $\beta \in [0,\infty]$, $NC_1 e^{-2\beta} \le U(\beta) \le NC_2 e^{-b\beta} + C_3 N^{\delta}$ w.h.p.

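Fix $\beta_1$ in such a way that $2C_1 e^{-2\beta_1} = \varepsilon$. Then $2U(\beta_1) \ge N\varepsilon$ w.h.p. By the upper bound in Eq. (50) and the monotonicity of $\Xi(\zeta)$, we get

$$\log\Xi(N\varepsilon) \le \log Z(\beta_1) + 2\beta_1 U(\beta_1) \le \log Z(\beta_1) + 2\beta_1 N C_2 e^{-b\beta_1} + 2\beta_1 C_3 N^{\delta}. \qquad (51)$$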

Using the definition of $\beta_1$, which gives $\beta_1 = \frac{1}{2}\log(2C_1/\varepsilon)$, we get that there exist $C, a > 0$ such that

$$\log\Xi(N\varepsilon) \le \log Z(\beta_1) + NC\varepsilon^{a} \qquad (52)$$

with high probability.

The lower bound on $\log\Xi(N\varepsilon)$ is proved analogously by taking $\beta_2$ such that $2C_2 e^{-b\beta_2} + 2C_3 N^{-1+\delta} = \varepsilon$, thus getting $\log\Xi(N\varepsilon) \ge \log Z(\beta_2) - NC\varepsilon^{a}$ w.h.p. One concludes by bounding the difference of the two partition functions: $|\log Z(\beta_1) - \log Z(\beta_2)| \le U(\beta_1)\,|\beta_1 - \beta_2| \le NC\varepsilon^{a}$ w.h.p. □